Statistics for Data Science II
\ln \left( \frac{\pi}{1-\pi} \right) = \beta_0 + \beta_1 x_1 + ... + \beta_k x_k
\ln \left( \frac{\pi_1 + ... + \pi_j }{\pi_{j+1} + ... + \pi_{c}} \right) = \hat{\beta}_{0j} + \hat{\beta}_{1} x_1 + ... + \hat{\beta}_{k} x_k
\ln \left( \frac{\pi_j}{\pi_{\text{ref}}} \right) = \hat{\beta}_{0j} + \hat{\beta}_{1j} x_1 + ... + \hat{\beta}_{kj} x_k
As in ordinal logistic regression, we will create c-1 models, one for each non-reference category of the outcome.
Unlike ordinal logistic regression, we no longer assume proportional odds.
This means that each of the c-1 models now has its own set of slopes.
\ln \left( \frac{\pi_j}{\pi_{\text{ref}}} \right) = \hat{\beta}_{0j} + \hat{\beta}_{1j} x_1 + ... + \hat{\beta}_{kj} x_k
We will use the multinom() function from the nnet package.
Let's consider data from a General Social Survey, relating political ideology to political party affiliation. Political ideology has a five-point ordinal scale, ranging from very liberal (Y=1) to very conservative (Y=5). Political party enters the model as an indicator variable, with x = 1 for Republicans and x = 0 for Democrats, which matches the PartyRepublican term in the R output. We will construct a nominal logistic regression model that models political ideology as a function of political party and sex, with very liberal as the reference category.
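The fitting call itself is not shown in these notes, so here is a minimal sketch of what it might look like. The data frame `gss_sim` and the model name `m_sim` are hypothetical: the data are simulated at random only so that the code runs on its own, and the level labels are assumed from the output shown in these notes. The results printed in the rest of these notes come from the real gss data, not from this simulation.

```r
# Sketch only: gss_sim is simulated, hypothetical data, NOT the survey data.
library(nnet)  # provides multinom()

set.seed(1)
n <- 400
ideology_levels <- c("1 - Very Liberal", "2 - Liberal", "3 - Moderate",
                     "4 - Conservative", "5 - Very Conservative")
gss_sim <- data.frame(
  Ideology = factor(sample(ideology_levels, n, replace = TRUE),
                    levels = ideology_levels),
  Party    = factor(sample(c("Democrat", "Republican"), n, replace = TRUE)),
  Sex      = factor(sample(c("Female", "Male"), n, replace = TRUE))
)

# The first factor level ("1 - Very Liberal") is the reference outcome category.
m_sim <- multinom(Ideology ~ Party + Sex, data = gss_sim, trace = FALSE)

# One row per non-reference category (c - 1 = 4 models),
# one column per coefficient (intercept plus two slopes).
dim(coef(m_sim))
```

Because the outcome has c = 5 categories and there are two predictors, the fit estimates 4 x 3 = 12 coefficients, which agrees with the "12 variable" note in the multinom() output.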
# weights: 20 (12 variable)
initial value 1343.880657
iter 10 value 1234.606706
final value 1231.166030
converged
Call:
multinom(formula = Ideology ~ Party + Sex, data = gss)
Coefficients:
                      (Intercept) PartyRepublican     SexMale
2 - Liberal            0.06688293       0.4241904 -0.13565500
3 - Moderate           0.89856922       0.8609859 -0.37175538
4 - Conservative      -0.73965106       1.6869369  0.16265398
5 - Very Conservative -0.40728933       1.5633886  0.07632974
Std. Errors:
                      (Intercept) PartyRepublican   SexMale
2 - Liberal             0.1902369       0.2833214 0.2645717
3 - Moderate            0.1623610       0.2426818 0.2272924
4 - Conservative        0.2255970       0.2871522 0.2694358
5 - Very Conservative   0.2067078       0.2727859 0.2572010
Residual Deviance: 2462.332
AIC: 2486.332
\begin{align*}
\ln \left( \frac{\pi_{\text{Lib}}}{\pi_{\text{V. Lib}}} \right) &= 0.07 + 0.42 \text{ republican} - 0.14 \text{ male} \\
\ln \left( \frac{\pi_{\text{Mod}}}{\pi_{\text{V. Lib}}} \right) &= 0.90 + 0.86 \text{ republican} - 0.37 \text{ male} \\
\ln \left( \frac{\pi_{\text{Cons}}}{\pi_{\text{V. Lib}}} \right) &= -0.74 + 1.69 \text{ republican} + 0.16 \text{ male} \\
\ln \left( \frac{\pi_{\text{V. Cons}}}{\pi_{\text{V. Lib}}} \right) &= -0.41 + 1.56 \text{ republican} + 0.08 \text{ male}
\end{align*}
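The four fitted equations give log odds relative to very liberal, and these can be turned into predicted probabilities for all five categories. Here is a minimal base R sketch, using the coefficients copied from the multinom() output. The respondent (a Republican male) is a hypothetical choice for illustration, and the label "1 - Very Liberal" for the reference category is assumed.

```r
# Coefficients copied from the multinom() output (one row per model).
b <- rbind(
  "2 - Liberal"           = c( 0.06688293, 0.4241904, -0.13565500),
  "3 - Moderate"          = c( 0.89856922, 0.8609859, -0.37175538),
  "4 - Conservative"      = c(-0.73965106, 1.6869369,  0.16265398),
  "5 - Very Conservative" = c(-0.40728933, 1.5633886,  0.07632974)
)
colnames(b) <- c("(Intercept)", "PartyRepublican", "SexMale")

# Hypothetical respondent: Republican (1) and male (1).
x <- c(1, 1, 1)  # intercept, republican, male

eta   <- drop(b %*% x)       # log odds of each category vs. very liberal
denom <- 1 + sum(exp(eta))   # the reference category contributes exp(0) = 1

# Reference category first, then the four modeled categories.
probs <- c("1 - Very Liberal" = 1 / denom, exp(eta) / denom)
round(probs, 3)
```

The five probabilities sum to 1, and the log of any ratio of a modeled category to the reference category returns the corresponding linear predictor.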
For a one [predictor unit] increase in [predictor], the odds of [response category j], as compared to [the reference category], are multiplied by e^{\hat{\beta}_{ij}}.
For a one [predictor unit] increase in [predictor], the odds of [response category j], as compared to [the reference category], are [increased or decreased] by [100(e^{\hat{\beta}_{ij}}-1)% or 100(1-e^{\hat{\beta}_{ij}})%].
As compared to [reference category of predictor], the odds of [response category j], as compared to [reference category of outcome], for [predictor category of interest] are multiplied by e^{\hat{\beta}_{ij}}.
As compared to [reference category of predictor], the odds of [response category j], as compared to [reference category of outcome], for [predictor category of interest] are [increased or decreased] by [100(e^{\hat{\beta}_{ij}}-1)% or 100(1-e^{\hat{\beta}_{ij}})%].
                      (Intercept) PartyRepublican SexMale
2 - Liberal                  1.07            1.53    0.87
3 - Moderate                 2.46            2.37    0.69
4 - Conservative             0.48            5.40    1.18
5 - Very Conservative        0.67            4.77    1.08
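The table of odds ratios is the exponentiated coefficient matrix. The sketch below rebuilds it in base R from the coefficients printed in the multinom() output, and also computes the percent change in odds used in the interpretations. With a fitted model object, the same table would presumably come from exponentiating its coefficient matrix directly; the typed-in matrix `b` is used here only so the example runs on its own.

```r
# Coefficients copied from the multinom() output (one row per model).
b <- rbind(
  "2 - Liberal"           = c( 0.06688293, 0.4241904, -0.13565500),
  "3 - Moderate"          = c( 0.89856922, 0.8609859, -0.37175538),
  "4 - Conservative"      = c(-0.73965106, 1.6869369,  0.16265398),
  "5 - Very Conservative" = c(-0.40728933, 1.5633886,  0.07632974)
)
colnames(b) <- c("(Intercept)", "PartyRepublican", "SexMale")

or <- exp(b)            # odds ratios, relative to very liberal
round(or, 2)

pct_change <- 100 * (or - 1)   # percent change in odds (negative = decrease)
round(pct_change["5 - Very Conservative", "PartyRepublican"])  # about 377
```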
Specific: As compared to someone who identifies as a Democrat, someone who identifies as a Republican has a 377% increase in the odds of saying their political ideology is very conservative as compared to very liberal.
More general: As compared to Democrats, those who identify as Republican have increased odds of reporting each of the more conservative political ideologies, as compared to very liberal.
Note that the results from summary() do not include hypothesis test results, so we construct Wald z tests from the coefficients and standard errors.
z <- summary(m1)$coefficients / summary(m1)$standard.errors # construct Wald z statistics
p <- (1 - pnorm(abs(z))) * 2 # construct two-sided p-values
t(p) # transpose so that each outcome category is a column
                2 - Liberal 3 - Moderate 4 - Conservative 5 - Very Conservative
(Intercept)       0.7251555 3.123147e-08     1.043091e-03          4.879684e-02
PartyRepublican   0.1343397 3.884666e-04     4.235763e-09          9.972635e-09
SexMale           0.6081371 1.019271e-01     5.460540e-01          7.666416e-01
Globally, only political party is a significant predictor of political ideology (p < 0.001).
This holds true when comparing moderate (p < 0.001), conservative (p < 0.001), and very conservative (p < 0.001) to very liberal, but not when comparing liberal to very liberal (p = 0.134).
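What if we are interested in comparing against, say, moderate ideology? We refit the model with moderate as the reference category of the outcome.
Global significance will not change.
Model-level significance will change, because each model now compares a category against moderate rather than very liberal.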
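To see why the model-level results change, note that the log odds relative to moderate are differences of the log odds relative to very liberal: ln(pi_j / pi_Mod) = ln(pi_j / pi_V.Lib) - ln(pi_Mod / pi_V.Lib). Here is a minimal base R sketch of that subtraction for conservative versus moderate, using coefficients copied from the multinom() output. The short `ideology` factor at the end is hypothetical and only illustrates how relevel() changes the reference level before refitting.

```r
# Coefficients copied from the multinom() output (each vs. very liberal).
b_mod  <- c("(Intercept)" =  0.89856922, PartyRepublican = 0.8609859,
            SexMale = -0.37175538)
b_cons <- c("(Intercept)" = -0.73965106, PartyRepublican = 1.6869369,
            SexMale =  0.16265398)

# Conservative vs. moderate: subtract the moderate model from the
# conservative model, coefficient by coefficient.
b_cons_vs_mod <- b_cons - b_mod
round(b_cons_vs_mod, 2)
round(exp(b_cons_vs_mod), 2)   # odds ratios relative to moderate

# Hypothetical factor: relevel() moves the chosen level to the front,
# which makes it the reference category when the model is refit.
ideology <- factor(c("1 - Very Liberal", "3 - Moderate", "4 - Conservative"))
levels(relevel(ideology, ref = "3 - Moderate"))
```

The subtraction recovers the point estimates only. Standard errors and p-values for the new comparisons depend on the covariances between coefficients, so in practice we would refit the model after releveling the outcome.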
As before, we can construct confidence intervals using the confint() function.
We, of course, want the confidence intervals for the odds ratios, so we exponentiate the interval endpoints.
, , 2 - Liberal
                2.5 % 97.5 %
(Intercept)      0.74   1.55
PartyRepublican  0.88   2.66
SexMale          0.52   1.47
, , 3 - Moderate
                2.5 % 97.5 %
(Intercept)      1.79   3.38
PartyRepublican  1.47   3.81
SexMale          0.44   1.08
, , 4 - Conservative
                2.5 % 97.5 %
(Intercept)      0.31   0.74
PartyRepublican  3.08   9.49
SexMale          0.69   2.00
, , 5 - Very Conservative
                2.5 % 97.5 %
(Intercept)      0.44   1.00
PartyRepublican  2.80   8.15
SexMale          0.65   1.79
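These intervals can be checked by hand. The sketch below assumes they are Wald intervals, exp(estimate ± z × SE), which appears to agree with the printed values to two decimals. It uses only the PartyRepublican estimates and standard errors copied from the multinom() output, so it runs without the fitted model.

```r
# PartyRepublican estimates and standard errors, copied from the output.
est <- c("2 - Liberal"           = 0.4241904,
         "3 - Moderate"          = 0.8609859,
         "4 - Conservative"      = 1.6869369,
         "5 - Very Conservative" = 1.5633886)
se  <- c(0.2833214, 0.2426818, 0.2871522, 0.2727859)

z_crit <- qnorm(0.975)  # about 1.96 for a 95% interval

# Build the interval on the log odds scale, then exponentiate.
ci <- cbind(OR    = exp(est),
            lower = exp(est - z_crit * se),
            upper = exp(est + z_crit * se))
round(ci, 2)
```

An interval for an odds ratio that excludes 1 corresponds to a significant Wald test at the 5% level, which matches the p-values for political party in every model except liberal versus very liberal.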
We have now covered logistic regression for all types of categorical outcomes.
Two responses \to binary logistic regression.
More than two ordered responses \to ordinal logistic regression.
More than two unordered responses \to nominal logistic regression.
Note that we have fit all of these models with a logit link function.
We can also use probit and complementary log-log (cloglog) link functions.
You can read a discussion about the differences on Stack Overflow.
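To get a feel for how the three links differ, we can compare their inverse link functions, which map a linear predictor to a probability. Here is a minimal base R sketch. The grid of linear predictor values in `eta` is an arbitrary choice for illustration.

```r
# Arbitrary grid of linear predictor values.
eta <- c(-2, -1, 0, 1, 2)

links <- cbind(
  eta     = eta,
  logit   = plogis(eta),           # exp(eta) / (1 + exp(eta))
  probit  = pnorm(eta),            # standard normal CDF
  cloglog = 1 - exp(-exp(eta))     # inverse complementary log-log
)
round(links, 3)
```

The logit and probit curves are symmetric around a probability of 0.5 at eta = 0, while the cloglog curve is asymmetric. For a binary outcome, glm() accepts these through binomial(link = "probit") and binomial(link = "cloglog").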